Take Home Message Preview





1. Cloud prototypes are underway to tackle
the Volume challenge of Big Data...


2. ...But advances in computer hardware or
cloud won't help (much) with Variety


3. Interoperability standards, conventions,
and community engagement are the key
to addressing Variety
Big Data Indicators





EOSDIS FY2015 Metrics

Unique Data Products: 9,462
Distinct Users of EOSDIS Data and Services:
Average Daily Archive Growth: 16 TB/day
Total Archive Volume (as of Sept. 30, 2015): 14.6 PB
End User Distribution Products: 1 billion
End User Average Daily Distribution Volume: 32.1 TB/day
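A quick back-of-the-envelope check puts these Volume figures in perspective. The sketch below assumes decimal units (1 PB = 1,000 TB) and a constant daily rate over a 365-day year; under those assumptions, 16 TB/day adds about 5.8 PB a year, roughly 40% of the 14.6 PB archive, and users download about twice as much each day as is archived.

```python
# Back-of-the-envelope arithmetic on the EOSDIS FY2015 metrics.
# Assumptions (not part of the metrics themselves): decimal units, 1 PB = 1,000 TB,
# and a constant daily rate over a 365-day year.

TB_PER_PB = 1000.0

DAILY_ARCHIVE_GROWTH_TB = 16.0   # Average Daily Archive Growth (TB/day)
TOTAL_ARCHIVE_PB = 14.6          # Total Archive Volume as of Sept. 30, 2015 (PB)
DAILY_DISTRIBUTION_TB = 32.1     # End User Average Daily Distribution Volume (TB/day)


def yearly_growth_pb(daily_tb: float, days: int = 365) -> float:
    """Volume added over `days` days at a constant daily rate, in PB."""
    return daily_tb * days / TB_PER_PB


def growth_fraction(daily_tb: float, total_pb: float) -> float:
    """One year of growth expressed as a fraction of the current archive size."""
    return yearly_growth_pb(daily_tb) / total_pb


def distribution_to_archive_ratio(distributed_tb: float, archived_tb: float) -> float:
    """TB sent to end users for every TB added to the archive, per day."""
    return distributed_tb / archived_tb


if __name__ == "__main__":
    print(f"Growth per year: {yearly_growth_pb(DAILY_ARCHIVE_GROWTH_TB):.2f} PB")
    print(f"Fraction of archive: {growth_fraction(DAILY_ARCHIVE_GROWTH_TB, TOTAL_ARCHIVE_PB):.0%}")
    print(f"Distributed/archived: "
          f"{distribution_to_archive_ratio(DAILY_DISTRIBUTION_TB, DAILY_ARCHIVE_GROWTH_TB):.2f}")
```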





EOSDIS Cloud Prototypes 
Analytics Support





Archive Cloud Prototypes 





Benefits from Archive in the Cloud
> Cost savings for storage of Big Data?
> Avoid data downloading and local data management
End Users


> Alaska Satellite Facility Web Object Storage prototype 
> Distribute Sentinel radar data from Amazon storage 


> Global Imagery Browse Service in the Cloud 
> Ingest and Archive management prototype 


Cloud Analytics Prototypes 





Benefits from Cloud Analytics
> Analyze data at scale
> Analyze datasets together easily
> Avoid data downloading and local data management

Analysis support toolbox to attract users to cloud analytics
> Community open source tools
> DAAC-developed tools
> Other NASA cloud-based data analytics and processing services
> Cloud analytics examples and recipes

Initial cross-DAAC proof of concept in progress, based on Python + JupyterHub
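The "analyze data at scale" benefit usually comes from reducing each granule close to where it is stored and combining only small partial results. The Python sketch below shows that map-reduce pattern for a mean value; it assumes each granule is already a flat sequence of floats and uses a hypothetical fill value (real products declare their own in metadata). It illustrates the pattern only and is not code from the DAAC toolbox.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Optional, Sequence

# Hypothetical fill value; real products declare their own in their metadata.
FILL_VALUE = -9999.0


@dataclass(frozen=True)
class Partial:
    """Running sum and count of valid values, small enough to ship between workers."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: "Partial") -> "Partial":
        """Combine two partial results; order does not matter (up to rounding)."""
        return Partial(self.total + other.total, self.count + other.count)


def summarize_granule(values: Sequence[float], fill_value: float = FILL_VALUE) -> Partial:
    """Map step: reduce one granule to a Partial, skipping fill values."""
    valid = [v for v in values if v != fill_value]
    return Partial(sum(valid), len(valid))


def mean_of_granules(granules: Iterable[Sequence[float]],
                     fill_value: float = FILL_VALUE) -> Optional[float]:
    """Reduce step: merge per-granule partials; granules are consumed one at a time."""
    partials = (summarize_granule(g, fill_value) for g in granules)
    combined = reduce(Partial.merge, partials, Partial())
    return combined.total / combined.count if combined.count else None


if __name__ == "__main__":
    granules = [[1.0, 2.0, FILL_VALUE], [3.0], [FILL_VALUE]]
    print(mean_of_granules(granules))  # 2.0
```

Because a Partial is just a sum and a count, partials computed by different workers can be merged in any order, which is what lets the map step run in parallel.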





Terra Incognita 














1. Vendor Lock-in 
2. Future storage costs 


3. Uncapped egress costs 





4. Security Restrictions 
5. Network trust 
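Egress is "uncapped" because cloud providers generally bill per gigabyte transferred out, so the bill follows user demand rather than a fixed budget. A minimal sketch, assuming a flat hypothetical price per GB (real pricing is tiered and provider-specific) and reusing the FY2015 figure of 32.1 TB/day of end-user distribution:

```python
# Sketch of why uncapped egress is a budget risk: the bill grows linearly with
# how much users download. The price used below is a HYPOTHETICAL placeholder,
# not a quote from any provider; real egress pricing is tiered.

GB_PER_TB = 1000.0  # decimal units assumed


def monthly_egress_cost(daily_tb: float, price_per_gb: float, days: int = 30) -> float:
    """Cost of serving `daily_tb` TB/day out of the cloud for `days` days."""
    if daily_tb < 0 or price_per_gb < 0:
        raise ValueError("volume and price must be non-negative")
    return daily_tb * days * GB_PER_TB * price_per_gb


def days_until_budget_spent(daily_tb: float, price_per_gb: float, budget: float) -> float:
    """With no cap on egress, a fixed budget runs out after this many days."""
    daily_cost = daily_tb * GB_PER_TB * price_per_gb
    return float("inf") if daily_cost == 0 else budget / daily_cost


if __name__ == "__main__":
    hypothetical_price = 0.05  # USD per GB, illustrative only
    for demand in (32.1, 64.2):  # FY2015 daily distribution volume, then doubled demand
        cost = monthly_egress_cost(demand, hypothetical_price)
        print(f"{demand:5.1f} TB/day -> ${cost:,.0f} per 30 days")
```

Doubling demand doubles the bill; nothing in the pricing model stops at a ceiling.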
Distinct Science Products Distributed
Instrument Variety





Satellite Instrument “Footprints” 





Example Imaging Footprint

[Figure: Best Total Ozone Solution; Wallow Fire (near Springerville, Arizona), MODIS Terra; panel note "9 hours later"; color scale 227 to 490]
Example Limb Scanning Footprint

[Figure: limb scanning footprint plotted as vertical distance (km) against 'horizontal' distance (km), horizontal axis spanning -4000 to 4000 km]

Microwave Limb Sounder (from Algorithm Theoretical Basis Document, Livesey and Wu, 1999)
Same Instrument, Different Satellite

[Figure: Aerosol Optical Depth 550 nm (Dark Target) from Aqua and Terra (panel ranges: Data Min = 0.0, Max = 2.7; Data Min = -0.1, Max = 3.2), plus an Aqua - Terra difference map]
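A difference map such as Aqua - Terra only makes sense once both products sit on the same grid, and cells where either satellite has no retrieval must be masked rather than treated as zero. A minimal sketch, assuming both grids are already co-registered lists of rows and that missing retrievals are represented as None (a stand-in for each product's own fill value):

```python
from typing import List, Optional

# A grid is a list of rows; None marks a cell with no valid retrieval.
Grid = List[List[Optional[float]]]


def difference_map(aqua: Grid, terra: Grid) -> Grid:
    """Cell-by-cell Aqua minus Terra, masking cells missing in either input."""
    if len(aqua) != len(terra) or any(len(a) != len(t) for a, t in zip(aqua, terra)):
        raise ValueError("Aqua and Terra grids must have the same shape")
    return [
        [None if a is None or t is None else a - t for a, t in zip(row_a, row_t)]
        for row_a, row_t in zip(aqua, terra)
    ]


if __name__ == "__main__":
    aqua = [[0.5, None], [1.0, 0.75]]
    terra = [[0.25, 0.2], [None, 0.5]]
    print(difference_map(aqua, terra))  # [[0.25, None], [None, 0.25]]
```

Part of any such difference is real rather than instrumental: Terra crosses the equator in the morning and Aqua in the afternoon, so the two see the atmosphere at different times of day.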
Same Instrument + Satellite, Different Algorithm

MODIS on Aqua: Aerosol Optical Depth

[Figure: Dark Target Algorithm vs. Deep Blue Algorithm]








Processing Levels

AIRS data for 2011-08-11

Level 1B: Calibrated radiance at a pixel
Level 2: Carbon monoxide for one scene
Level 3: Global carbon monoxide for one night
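Moving from Level 2 to Level 3 means resampling scene-level retrievals onto a regular global grid. The sketch below uses simple cell averaging (binning) at an assumed 1° resolution; it illustrates the idea only and is not the AIRS Level 3 algorithm, which may apply quality screening and weighting not shown here.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]  # (row, col) index of a lat/lon grid cell


def grid_level2(points: Iterable[Tuple[float, float, float]],
                res_deg: float = 1.0) -> Dict[Cell, float]:
    """Average Level 2 (lat, lon, value) retrievals into res_deg x res_deg cells.

    Row 0 starts at 90S and column 0 at 180W. Cells with no retrievals are
    simply absent from the result.
    """
    n_rows, n_cols = round(180 / res_deg), round(360 / res_deg)
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for lat, lon, value in points:
        row = min(math.floor((lat + 90.0) / res_deg), n_rows - 1)  # lat = 90 goes in the top row
        col = math.floor((lon + 180.0) / res_deg) % n_cols          # lon = 180 wraps to 180W
        sums[(row, col)] += value
        counts[(row, col)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


if __name__ == "__main__":
    scene = [(10.2, 20.7, 100.0), (10.8, 20.1, 200.0)]
    print(grid_level2(scene))  # {(100, 200): 150.0}
```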
[Figure: SurfAirTemp_D map; color scale 275.000 to 303.000; Data Min = 212.125, Max = 306.688, Mean = 286.636]


Time Aggregation

Aerosol Optical Depth at 555 nm from the Multi-angle Imaging SpectroRadiometer (MISR): Daily vs. Monthly

[Figure, Daily: Time Averaged Map of Aerosol Optical Depth 555 nm daily 0.5 deg. [MISR MIL3DAE v4] over 2009-09-21, Region 180W, 90S, 180E, 90N]

[Figure, Monthly: Time Averaged Map of Aerosol Optical Depth 555 nm monthly 0.5 deg. [MISR MIL3MAE v4] over 2009-Sep, Region 180W, 90S, 180E, 90N]
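Time aggregation fills gaps: a single day from a narrow-swath instrument such as MISR covers only part of the globe, while a month of days covers far more. A minimal per-cell averaging sketch, assuming daily grids of equal shape with None for missing cells; the `min_days` threshold is a hypothetical knob, and the actual MISR Level 3 processing may differ.

```python
from typing import List, Optional, Sequence

Grid = List[List[Optional[float]]]  # rows of cells; None = no valid data that day


def monthly_mean(daily_grids: Sequence[Grid], min_days: int = 1) -> Grid:
    """Per-cell mean across daily grids, ignoring missing cells.

    A cell stays None unless it has at least `min_days` valid daily values.
    """
    if not daily_grids:
        raise ValueError("need at least one daily grid")
    n_rows, n_cols = len(daily_grids[0]), len(daily_grids[0][0])
    result: Grid = []
    for i in range(n_rows):
        row: List[Optional[float]] = []
        for j in range(n_cols):
            values = [day[i][j] for day in daily_grids if day[i][j] is not None]
            enough = bool(values) and len(values) >= min_days
            row.append(sum(values) / len(values) if enough else None)
        result.append(row)
    return result


if __name__ == "__main__":
    days = [[[1.0, None]], [[3.0, None]], [[None, 4.0]]]
    print(monthly_mean(days))              # [[2.0, 4.0]]
    print(monthly_mean(days, min_days=2))  # [[2.0, None]]
```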
Spatial Aggregation

SeaWiFS Deep Blue Aerosol Optical Depth, 2006-10-06

[Figure: Aerosol Optical Depth 550 nm; Data Min = 0.02, Max = 2.33]
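Spatial aggregation trades resolution for coverage and smaller files. A minimal block-averaging sketch, assuming a grid whose dimensions are multiples of the coarsening factor and None for missing cells; it deliberately ignores the shrinking area of cells toward the poles, which a real product would usually need to account for (for example by weighting with cos(latitude)).

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]  # rows of cells; None = missing


def coarsen(grid: Grid, factor: int) -> Grid:
    """Average non-overlapping factor x factor blocks, ignoring missing cells.

    Unweighted: every fine cell counts equally, regardless of its latitude.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    result: Grid = []
    for r0 in range(0, n_rows, factor):
        row: List[Optional[float]] = []
        for c0 in range(0, n_cols, factor):
            block = [grid[r][c] for r in range(r0, r0 + factor) for c in range(c0, c0 + factor)]
            valid = [v for v in block if v is not None]
            row.append(sum(valid) / len(valid) if valid else None)
        result.append(row)
    return result


if __name__ == "__main__":
    fine = [[1.0, 2.0, 3.0, None],
            [3.0, 2.0, None, None]]
    print(coarsen(fine, 2))  # [[2.0, 3.0]]
```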


Data Formats

• Self-describing, API-based
  ○ Hierarchical Data Format (HDF)
  ○ network Common Data Form (netCDF)
• Additional conventions
  ○ HDF-EOS
  ○ Climate and Forecast (CF) coordinates
• Other standards
  ○ Gridded Binary (GRIB)
  ○ ICARTT (airborne)
• Binary
• ASCII
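The gap between self-describing formats and plain binary is easy to see in code. In a flat binary file, nothing in the bytes records the array shape, data type, or byte order, so the reader must take them from separate documentation, and the same bytes decode silently into a different shape if the caller guesses wrong. A minimal sketch, assuming a hypothetical layout of big-endian 32-bit floats:

```python
import struct
from typing import List


def read_flat_binary(raw: bytes, rows: int, cols: int, byte_order: str = ">") -> List[List[float]]:
    """Decode rows x cols 32-bit floats from a flat binary buffer.

    Nothing in `raw` records rows, cols, type, or byte order; the caller must
    supply them from outside documentation (here: an assumed layout).
    """
    count = rows * cols
    expected = count * struct.calcsize(f"{byte_order}f")
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    flat = struct.unpack(f"{byte_order}{count}f", raw)
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


if __name__ == "__main__":
    raw = struct.pack(">4f", 1.0, 2.0, 3.0, 4.0)
    print(read_flat_binary(raw, 2, 2))  # [[1.0, 2.0], [3.0, 4.0]]
    print(read_flat_binary(raw, 1, 4))  # same bytes, a different but equally "valid" shape
```

HDF and netCDF files, by contrast, store dimensions, types, and attributes alongside the data, which is what lets generic tools open them without side information.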





Solutions to the Variety Problem 


1. Interoperable discipline-focused DAACs
2. Common Metadata Repository
3. OPeNDAP* data services
4. Community engagement


*Open-source Project for a Network Data Access Protocol





Discipline-Focused
Distributed Active Archive Centers (DAACs)

Different DAACs have different
"Spheres of Influence"


Spheres: Atmo | Hydro | Cryo | Litho | Bio | Anthropo

DAACs:
Alaska Satellite Facility
Atm. Sciences Data Center
Crustal Dynamics Data Info Sys
Global Hydrology Resource Ctr
Goddard Earth Sciences DISC
Land Processes DAAC
L1 and Atm Archive & Dist Sys
Nat. Snow Ice Data Ctr DAAC
Oak Ridge Nat Lab DAAC
Ocean Biology DAAC
Physical Oceanography DAAC
Socioeconomic Data Arch Ctr

Sub-specialties: SAR; Space geodesy; Weather events; MODIS, VIIRS; Field experiments
The Common Metadata Repository presents a consistent
catalog for discovery of data from multiple DAACs

[Diagram: Earthdata Search Client; metadata; Unified Metadata Model]
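For machine access, the Common Metadata Repository exposes a search API over HTTP. A minimal sketch that builds a collection-search request, assuming the public endpoint https://cmr.earthdata.nasa.gov/search/collections.json and its `keyword`, `provider`, and `page_size` parameters; check the CMR Search API documentation before relying on them. The fetch helper needs network access and assumes the JSON response lists results under feed/entry, each with a "title" field.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public CMR collection-search endpoint (JSON response format).
CMR_COLLECTIONS = "https://cmr.earthdata.nasa.gov/search/collections.json"


def cmr_collections_url(keyword: str, provider: Optional[str] = None, page_size: int = 10) -> str:
    """Build a collection-search URL; no network access happens here."""
    params: Dict[str, object] = {"keyword": keyword, "page_size": page_size}
    if provider:
        params["provider"] = provider
    return f"{CMR_COLLECTIONS}?{urlencode(params)}"


def search_titles(url: str, timeout: float = 30.0) -> List[str]:
    """Fetch a search URL and return collection titles (requires network access).

    Assumes results are listed under feed -> entry, each with a "title" field.
    """
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return [entry.get("title", "") for entry in payload.get("feed", {}).get("entry", [])]


if __name__ == "__main__":
    print(cmr_collections_url("aerosol optical depth", page_size=5))
```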
One Metadata System to rule them all,
One Metadata System to find them,
One Metadata System to bring them all
And in cyberspace bind them


OPeNDAP 





• Open-source Project for a Network Data Access Protocol
• High-performance network access protocol for complex science data
• Well-supported in Earth science community tools
  ○ Free: Panoply, IDV, McIDAS-V, NCO...
  ○ Commercial: ArcGIS, MATLAB, IDL...
OPeNDAP* access to data smooths out format
heterogeneity and supports subsetting


[Diagram: Analysis Tool; netCDF Library; Hierarchical Data Format version 4 files; Hierarchical Data Format version 5 files; network Common Data Form files; ASCII files; flat binary files]





*Although the Hyrax implementation is shown, other OPeNDAP servers such as GrADS Data Server
and THREDDS Data Server have similar capabilities but different architectures.
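Subsetting through OPeNDAP is expressed in the request URL. In the DAP2 protocol, a variable can be followed by one `[start:stride:stop]` hyperslab per dimension (zero-based indices, inclusive stop), and a suffix such as `.ascii` or `.dods` selects the response form. A minimal URL-builder sketch; the dataset URL and variable name in the usage are hypothetical.

```python
from typing import Sequence, Tuple

Slab = Tuple[int, int, int]  # (start, stride, stop), stop inclusive as in DAP2


def hyperslab(slabs: Sequence[Slab]) -> str:
    """Format one [start:stride:stop] constraint per dimension."""
    parts = []
    for start, stride, stop in slabs:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid hyperslab {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return "".join(parts)


def dap2_subset_url(dataset_url: str, variable: str, slabs: Sequence[Slab],
                    suffix: str = ".ascii") -> str:
    """URL asking an OPeNDAP server for a subset of one variable."""
    return f"{dataset_url}{suffix}?{variable}{hyperslab(slabs)}"


if __name__ == "__main__":
    # Hypothetical dataset URL and variable name, for illustration only.
    print(dap2_subset_url("https://example.org/opendap/aod_daily.nc", "AOD_550",
                          [(0, 1, 0), (100, 1, 109), (170, 1, 184)]))
```

Because the netCDF library can open OPeNDAP URLs directly when built with DAP support, tools built on it can request such subsets without knowing the format of the file on the server.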


Data transformation options of several kinds 
can help with Variety and Volume 








Data transformation applies
fundamental changes and
conversions to attributes of the
original data to suit the application
requirements of end users










[Figure: Spectral Subsetting; Spatial Subsetting]

Courtesy of B. Ramachandran, MODAPS/LAADS
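Spatial subsetting starts by turning a latitude/longitude box into index ranges on the data's grid; those indices are what a subsetting service, such as an OPeNDAP hyperslab request, actually needs. A minimal sketch, assuming a regular grid whose first cell starts at 90S, 180W with a hypothetical 1° spacing, and a box that does not cross the dateline:

```python
import math
from typing import Tuple

IndexRange = Tuple[int, int]  # (first, last), both inclusive


def axis_range(lo: float, hi: float, origin: float, res: float, n: int) -> IndexRange:
    """Indices of cells intersecting [lo, hi); cell i spans [origin + i*res, origin + (i+1)*res)."""
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    first = max(0, math.floor((lo - origin) / res))
    last = min(n - 1, math.ceil((hi - origin) / res) - 1)
    return first, last


def bbox_to_indices(south: float, north: float, west: float, east: float,
                    res: float = 1.0) -> Tuple[IndexRange, IndexRange]:
    """Row (latitude) and column (longitude) index ranges for a bounding box."""
    rows = axis_range(south, north, -90.0, res, round(180 / res))
    cols = axis_range(west, east, -180.0, res, round(360 / res))
    return rows, cols


if __name__ == "__main__":
    print(bbox_to_indices(10, 20, -10, 5))  # ((100, 109), (170, 184))
```

A box that crosses the dateline would need its longitude range split into two requests.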


Big Earth Data Initiative (BEDI) 





• Multi-agency effort driven by the Office of Science and Technology Policy (OSTP)
• Focus on datasets in Societal Benefit Areas
• Several interoperability aspects...


BEDI in EOSDIS 





• Improve dataset consistency across EOSDIS
  ○ Metadata in Common Metadata Repository
  ○ Data in OPeNDAP
• Improve machine access to EOSDIS
  ○ Developers' portal
    ▪ How to Access Common Metadata Repository
    ▪ How to Access OPeNDAP-served Data
  ○ OPeNDAP performance
  ○ OPeNDAP use with Cloud storage
Community Engagement on Big Data

• Earth Science Information Partners (ESIP)
  ○ Variety: Clusters on Discovery, Information Quality
  ○ Volume: Clusters on Earth Science Data Analytics and Cloud Computing
• Earth Science Data Systems Working Groups
  ○ Formed of DAACs, ACCESS and MEaSUREs award winners
  ○ Variety: Working Groups on Dataset Interoperability, Search Relevancy
  ○ Volume: OPeNDAP Best Practices, Cloud Computing
• User Needs efforts
  ○ DAAC User Working Groups
  ○ American Customer Satisfaction Index survey
  ○ EOSDIS User Needs Analysis group
Big Data Community Engagement

• Big Data theme for both ESIP 2016 meetings
• Co-convening AGU 2016 session on Big Data Analytics
• Program committee for IEEE Workshop on Big Data in Earth and Planetary Sciences
• ESA's Big Data from Space (BiDS) workshops
  ○ "Improving Earth Science Data Discoverability And Use Through Metadata Relationship Graphs, Virtual Collections, And Search Relevancy"


User Needs from Community Sources 


Take Home Message 





1. Cloud prototypes are underway to 
tackle the Volume challenge of Big Data... 


2. ...But advances in computer hardware or
cloud won't help (much) with Variety 


3. Standards, conventions, and community 
engagement are the key to addressing 
Variety 


Backup Slides 
OPeNDAP Enhancements from the
Big Earth Data Initiative

• More OPeNDAP for EOSDIS data
• More aggregation along time for data in OPeNDAP
  ○ Improved performance for aggregation in Hyrax


